This notebook documents preliminary analysis of tracking data for fish tagged in Molokini Crater between 2020-05-16 and 2021-05-24.
The purpose of this study is to understand how human impacts affect the fish of Molokini Crater
We are particularly interested in answering the following hypotheses: 1. Is the presence of fish affected by vessel presence
Proposed Approach: 1. Begin by calculating the number of each species tagged and basic summary statistics 2. Calculate Metrics - Receiver Use - Pianka’s Niche Overlap - residency 3. Make the following plots - Map - Receiver locations - Map - Average receiver use by Species - Scatterplot - day night plots - Bar Plot - The number of detections per day (individual) - Bar Plot - The number of individuals detected (species) - Line Chart - The proportion of individuals detected n days after tagging (30 day moving average by species) - Bar Plot - Daily vessel traffic - Scatter Plot - vessel traffic vs. proportion of fish detected in crater daily (scatterplot by species) 4. Perform the following statistical Tests - Compare Residency Rates by Species - Compare residency by species, size, and time at liberty - Create a GLM comparing # of individuals in crater regressed against boat traffic and species using AR(1) term on dependent variable on some time scale (daily? 6 hours? depends on resolution of vessel data)
project_directory = '/Users/stephenscherrer/Documents/Programming/Projects/Molokini'
scripts_directory = file.path(project_directory, 'Analysis Scripts')
data_directory = file.path(project_directory, 'Data')
results_directory = file.path(project_directory, 'Results')
figure_directory = file.path(results_directory, 'Figures')
source(file.path(scripts_directory, 'Utility Functions.R'))
## Files from VUE
molo_df = load_vemco_data(file.path(data_directory, 'VUE_Export.csv'))
false_detections_df = load_fdf_report(file.path(data_directory, 'FDA.csv'))
## Vessel Traffic data
vessel_df =
## Metadata Files
tagging_df = load_tagging_data(file.path(data_directory, ))
## temp
tagging_df = data.frame(tag_id = detections_per_tag_per_receiver$tag_id, species = sample(c('a', 'b', 'c'), size = nrow(detections_per_tag_per_receiver), replace = T))
receiver_df = load_receiver_data(file.path(data_directory, ))
## Associate detections with time of day
molo_df = get_time_of_day(molo_df)
## Remove irrelevant tags
# molo_df = molo_df[molo_df$full_tag_id %in% tagging_df$vem_tag_id, ]
## Filter false detections
molo_df = filter_false_detections(molo_df)
tags_by_species = aggregate(tag_id ~ species, data = tagging_df, FUN = uniqueN)
colnames(tags_by_species) = c('species', 'n_tagged')
print(tags_by_species)
# Time at liberty
time_at_liberty = calculate_time_at_liberty(molo_df)
# Days Detected
days_detected = calculate_days_detected(molo_df)
# % of days detected
detection_stats = merge(x = days_detected, y = time_at_liberty[ ,c('tag_id', 'days_at_liberty')], on.x = 'tag_id', on.y = 'tag_id')
detection_stats$percent_days_detected = round(detection_stats$unique_days / detection_stats$days_at_liberty, 4) * 100
# Merge with tagging data to get fish info
detection_stats = merge(x = tagging_df[ ,c('tagging_datetime', 'species', 'tag_id', 'fork_length_cm', )], y = detection_stats, on.x = 'tag_id', on.y = 'tag_id')
detection_stats = detection_stats[ ,order(detection_stats$species, detection_stats$tagging_datetime, detection_stats$tag_id)]
print(detection_stats)
0 = no overlap, 1 = perfect overlap
Study Area
## Make species plots for receiver use
for(species in species_receiver_use$species){
receiver_use_by_spp = ggmap(molo_basemap) +
geom_point(data = species_receiver_use[species_receiver_use$species == species, ],
mapping = aes(x = lon, y = lat, color = 'red', size = receiver_use)) +
labs(x = '°Longitude', y = '°Latitude') +
ggsave(filename = paste('Receiver Use by ', species, '.pdf', sep = ''), path = figure_directory)
print(receiver_use_by_spp)
}
Saving 7 x 7 in image
### Day Night Plots
## For all fish
plot_day_night(molo_df, plot_title = 'All Fish')
## By Species
for (spp in unique(tagging_df$species)){
plot_day_night(molo_df[molo_df$tag_id == tagging_df$tag_id[tagging_df$species == spp], ], plot_title = spp)
}
## By Individual
for (tag_id in molo_df$tag_id){
plot_day_night(molo_df[molo_df$tag_id == tag_id, ], plot_title = paste(tagging_df$species[tagging_df$tag_id == tag_id], '- Tag', as.character(tag_id), sep = ' '))
}
ggplot(data = detections_per_day_df, mapping = aes_string(x = 'date', y = column)) +
geom_bar(stat = "identity") + labs(title = paste('Tag ', strsplit(x = column, split = '_')[[1]][2], sep = ''), x = 'Date', y = 'Detections') + ggsave(filename = paste('Daily Detection Barplot -', column, '.pdf'), path = figure_directory)
Saving 7 x 7 in image
## Convert detections_per_day to presence/absence
presence_absence_wide_df = detections_per_day_df
for (i in 2:ncol(presence_absence_wide_df)){
presence_absence_wide_df[ ,i] = as.numeric(presence_absence_wide_df[ ,i] > 0)
}
## Convert from wide to long format
presence_absence_long_df = melt(presence_absence_wide_df, id.vars = c('date'), measure.vars = colnames(presence_absence_wide_df)[2:ncol(presence_absence_wide_df)], variable.name = 'tag_id', value.name = 'detected')
# Drop 'tag_' prefix from tag_id column for matching purposes
presence_absence_long_df$tag_id = levels(presence_absence_long_df$tag_id)[presence_absence_long_df$tag_id]
for(i in 1:nrow(presence_absence_long_df)){
presence_absence_long_df$tag_id[i] = strsplit(presence_absence_long_df$tag_id[i], split = '_')[[1]][2]
}
## Merge with species from tagging data
presence_absence_long_df = merge(x = presence_absence_long_df, y = tagging_df[ ,c('tag_id', 'species')], on = 'tag_id')
## Drop date and tag pairs preceding the date the fish was tagged
indicies_to_drop = c()
for(i in nrow(presence_absence_long_df)){
if(as.Date(tagging_df$datetime[tagging_df$tag_id == presence_absence_long_df$tag_id[i]]) <= presence_absence_long_df$date[i]){
indicies_to_drop = c(indicies_to_drop, i)
}
}
presence_absence_long_df = presence_absence_long_df[-indicies_to_drop, ]
## Get a list of active tags by date and species
active_tags_by_date = aggregate(tag_id ~ date + species, data = presence_absence_long_df, FUN = uniqueN)
colnames(active_tags_by_date) = c('date', 'species', 'deployed_tags')
## Standardize tag counts by tags deployed and plot as % of tags detected per day by species
for(species in unique(presence_absence_long_df$species)){
# Count number of tags detected daily by species
presence_absence_by_spp_df = aggregate(detected~date, data = presence_absence_long_df[presence_absence_long_df$species == species, ], FUN = sum)
colnames(presence_absence_by_spp_df) = c('date', 'tags_detected')
# Standardize daily tag count by the number of tags deployed
presence_absence_by_spp_df = merge(x = presence_absence_by_spp_df, y = active_tags_by_date[active_tags_by_date$species == species, ], on = 'date')
presence_absence_by_spp_df$percent_tags_detected = presence_absence_by_spp_df$detected / presence_absence_by_spp_df$deployed_tags
# Make plot at species level
ggplot(data = presence_absence_by_spp_df, mapping = aes(x = date, y = percent_tags_detected)) +
geom_bar(stat = 'identity') +
labs(title = species, x = 'Date', y = '% of tags detected') +
ggsave(filename = paste('Detections Standardized By Species - ', species, '.pdf', sep = ''), path = figure_directory)
}
### LOGIC HERE TO GET TO # BOATS / DAY
## Get max_vessels at any given time, total_vessels
# Make plot for max_vessels
max_vessels_plot = ggplot(data = vessels_per_day, mapping = aes(x = date, y = max_vessels)) +
geom_bar(stat = 'identity') +
labs(title = 'Maximum Number of Co-occuring Vessels Daily', x = 'Date', y = '# of Vessels') +
ggsave(filename = paste('Maximum Number of Co-occuring Vessels Daily.pdf ', species, '.pdf', sep = ''), path = figure_directory)
# Make plot for tptal_vessels
total_vessels_plot = ggplot(data = vessels_per_day, mapping = aes(x = date, y = total_vessels)) +
geom_bar(stat = 'identity') +
labs(title = 'Maximum Number of Co-occuring Vessels Daily', x = 'Date', y = '# of Vessels') +
ggsave(filename = paste('Total Vessels Daily.pdf ', species, '.pdf', sep = ''), path = figure_directory)
print(max_vessels_plot)
print(total_vessels_plot)
Calculate mean residency by spp (irregardless of time), then ANOVA by spp Use Tukey’s HSD to determine significance
## Tukey's Honestly Significant Differences between species
TukeyHSD(residence_by_species_anova)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = residence_metric ~ species, data = detection_stats)
$species
diff lwr upr p adj
b-a -0.1864600 -0.548699444 0.1757795 0.4333781
c-a 0.1765415 -0.229063083 0.5821461 0.5481175
c-b 0.3630015 -0.006985495 0.7329885 0.0555247
GLM comparing size and residency time by spp independent var (size, time at liberty) dependent (residency index)
summary(species_glm)
Call:
glm(formula = residence_metric ~ species + days_at_liberty *
species, family = binomial(logit), data = detection_stats)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7145 -0.7787 0.4530 0.7932 2.2305
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.644440 0.931881 1.765 0.0776 .
speciesb -0.662091 1.160109 -0.571 0.5682
speciesc 0.585109 1.614362 0.362 0.7170
days_at_liberty -0.007911 0.004178 -1.893 0.0583 .
speciesb:days_at_liberty -0.001790 0.005936 -0.302 0.7630
speciesc:days_at_liberty 0.003451 0.006198 0.557 0.5776
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 59.071 on 51 degrees of freedom
Residual deviance: 41.195 on 46 degrees of freedom
AIC: 63.294
Number of Fisher Scoring iterations: 4